List of Flash News about METR benchmarks
| Time | Details |
|---|---|
| 2026-01-15 22:18 | Anthropic API Data: Claude 50% Success on 3.5-Hour Tasks, Highly Reliable on Longer Horizons vs METR Benchmarks<br>According to @AnthropicAI, API data shows that Claude achieves a 50% success rate on 3.5-hour tasks and is highly reliable on longer task horizons. These task horizons are longer than those in METR benchmarks but are fundamentally different, because users can iteratively work toward success on tasks that Claude is known to perform well on. The post links to additional context at https://t.co/RxKnLNMEYj and includes an image. It does not mention crypto assets, tokens, pricing, partnerships, or release timelines, so it provides no direct crypto-market linkage. Source: @AnthropicAI on X, Jan 15, 2026. |